Learning with Class Skews and Small Disjuncts
Abstract
One of the main objectives of a Machine Learning (ML) system is to induce a classifier that minimizes classification errors. Two relevant topics in ML are understanding which domain characteristics and which inducer limitations might increase misclassification. In this sense, this work analyzes two important issues that might influence the performance of ML systems: class imbalance and error-prone small disjuncts. Our main objective is to investigate how these two aspects are related to each other. Aiming to overcome both problems, we analyze the behavior of two over-sampling methods we have proposed, namely Smote + Tomek links and Smote + ENN. Our results suggest that these methods are effective for dealing with class imbalance and, in some cases, might help rule out some undesirable disjuncts. However, in some cases a simpler method, random over-sampling, provides comparable results while requiring fewer computational resources.
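The methods named in the abstract combine an over-sampler with a data-cleaning step: a Tomek link is a pair of mutual nearest neighbors from different classes, and removing such pairs cleans the class boundary after over-sampling. As a rough illustration only (not the authors' implementation; the dataset and parameters below are invented), a Tomek-link detector and a plain random over-sampler can be sketched in NumPy:

```python
import numpy as np

def tomek_links(X, y):
    # A pair (i, j) forms a Tomek link when i and j are each other's
    # nearest neighbor and carry different class labels.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    nn = d.argmin(axis=1)                # index of each point's nearest neighbor
    return [(i, int(j)) for i, j in enumerate(nn)
            if nn[j] == i and y[i] != y[j] and i < j]

def random_oversample(X, y, rng=None):
    # Duplicate randomly chosen minority-class examples until every
    # class reaches the size of the majority class.
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts_X, parts_y = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            idx = rng.choice(np.flatnonzero(y == c), size=target - n)
            parts_X.append(X[idx])
            parts_y.append(y[idx])
    return np.vstack(parts_X), np.concatenate(parts_y)
```

A Smote-style combined method would generate synthetic (interpolated) minority examples instead of duplicates and then drop the instances that participate in Tomek links; the sketch above only shows the two building blocks.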
Similar Articles
Learning with Rare Cases and Small Disjuncts
Systems that learn from examples often create a disjunctive concept definition. Small disjuncts are those disjuncts which cover only a few training examples. The problem with small disjuncts is that they are more error-prone than large disjuncts. This paper investigates the reasons why small disjuncts are more error-prone than large disjuncts. It shows that when there are rare cases within a do...
The Impact of Small Disjuncts on Classifier Learning
Many classifier induction systems express the induced classifier in terms of a disjunctive description. Small disjuncts are those disjuncts that classify few training examples. These disjuncts are interesting because they are known to have a much higher error rate than large disjuncts and are responsible for many, if not most, classification errors. Previous research has investigated thi...
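Disjunct size is simply the number of training examples a rule covers, and its error rate is the fraction of those examples the rule mislabels. As a purely hypothetical illustration (the rules and data below are invented, not taken from any of the cited papers), the statistic can be computed like this:

```python
def disjunct_stats(rules, X, y):
    # rules: list of (predicate, predicted_label) pairs, one per disjunct.
    # Returns, for each rule, (coverage, training error rate).
    stats = []
    for pred, label in rules:
        covered = [yi for xi, yi in zip(X, y) if pred(xi)]
        n = len(covered)
        errors = sum(1 for yi in covered if yi != label)
        stats.append((n, errors / n if n else 0.0))
    return stats

# Hypothetical rule set: one large disjunct, one small disjunct.
rules = [(lambda x: x[0] > 0.5, 1),
         (lambda x: x[0] <= 0.5, 0)]
X = [(0.9,), (0.8,), (0.7,), (0.6,), (0.2,), (0.3,)]
y = [1, 1, 1, 0, 0, 1]
```

On this toy data the small disjunct (coverage 2) has twice the error rate of the large one (coverage 4), mirroring the pattern the papers above report.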
Reducing the Small Disjuncts Problem by Learning Probabilistic Concept Descriptions
Concept learners that learn concept descriptions consisting of rules have been shown to be prone to the small disjuncts problem (Holte et al., 1989). This is the problem where a large proportion of the overall classification error made by the concept description on an independent test set can be attributed to rules which were true for a small number of training examples. In noisy domains, such c...
Diversifying Support Vector Machines for Boosting using Kernel Perturbation: Applications to Class Imbalance and Small Disjuncts
The diversification (generating slightly varying separating discriminators) of Support Vector Machines (SVMs) for boosting has proven to be a challenge due to the strong learning nature of SVMs. Based on the insight that perturbing the SVM kernel may help in diversifying SVMs, we propose two kernel perturbation based boosting schemes where the kernel is modified in each round so as to increase ...
A Quantitative Study of Small Disjuncts
Systems that learn from examples often express the learned concept in the form of a disjunctive description. Disjuncts that correctly classify few training examples are known as small disjuncts and are interesting to machine learning researchers because they have a much higher error rate than large disjuncts. Previous research has investigated this phenomenon by performing ad hoc analyses of a ...
Journal:
Volume / Issue:
Pages: -
Publication date: 2004